Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation Frequency

نویسندگان

  • Ekapol Chuangsuwanich
  • James R. Glass
چکیده

The task of robustly detecting distant speech in low SNR environments for automatic speech recognition is examined using a two-stage approach based on two distinguishing features of speech, namely harmonicity and modulation frequency (MF). A modified metric for harmonicity is used as a gating function to a set of parallel classifiers that incorporate MFs computed on different frequency bands. Performance is evaluated on both the frame-level discriminative power and also the system level ASR results on a real-world robotic forklift task. Compared to other previously proposed features such as relative spectral entropy, and classification strategies involving MFs, the combined approach shows good generalization across different kinds of dynamic noise conditions, and obtains a significant improvement on the false alarm rate at low speech miss rate settings. The overall ASR results also improved significantly compared to the ESTI AMR-VAD2, while reducing the number of false alarms by a factor of two.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Incremental acoustic subspace learning for voice activity detection using harmonicity-based features

This paper presents novel voice activity detection (VAD) approach based on incremental subspace learning using harmonicity-based features. Harmonic structure is well known as noise robust speech feature. We develop novel harmonicitybased feature based on temporal-spectral co-occurrence patterns. At statistical decision stage, many conventional statistical VAD methods rely on Gaussian model; how...

متن کامل

A low-complexity voice activity detector for smart hearing protection of hyperacusic persons

In this paper, a Voice Activity Detector (VAD) is proposed for smart hearing protection applications where speech is to get through the hearing protector while ambient noise is to be blocked out. The VAD calculates a short-term statistical assessment of the temporal envelopes within different frequency bands. This assessment uses the Inter-Quartile Range (IQR) and reflects the dispersion of the...

متن کامل

New harmonicity measures for pitch estimation and voice activity detection

Harmonic structure can be easily recognized in the timefrequency representation of speech signals even in the diverse environment. The harmonicity is a measure of the completeness of harmonic structure. This paper extends the use of conventional harmonicity measure to the tasks of pitch estimation and voice activity detection. A set of hierarchical harmonicities, including grid, temporal, spect...

متن کامل

Speech event detection using multiband modulation energy

The need for efficient, sophisticated features for speech event detection is inherent in state of the art processing, enhancement and recognition systems. We explore ideas and techniques from non-linear speech modeling and analysis, like modulations and multiband filtering and propose new energy and spectral content features derived through filtering in multiple frequency bands and tracking dom...

متن کامل

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals maybe classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011